Recently, diffusion frameworks have achieved performance comparable to previous state-of-the-art image generation models. Researchers are curious about their variants for discriminative tasks because of the powerful noise-to-image denoising pipeline. This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process. The model is trained to reverse the noisy ground truth without any inductive bias from a region proposal network (RPN). During inference, it takes a randomly generated filter as input and outputs masks via one-step or multi-step denoising. Extensive experimental results on COCO and LVIS show that DiffusionInst achieves competitive performance compared to existing instance segmentation models. We hope our work can serve as a simple yet effective baseline and inspire the design of more efficient diffusion frameworks for challenging discriminative tasks. Our code is available at https://github.com/chenhaoxing/DiffusionInst.
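As a rough illustration of the "noisy ground truth" that a noise-to-filter model would be trained to reverse, the sketch below applies the standard forward diffusion step q(x_t | x_0) to a small filter vector. The linear beta schedule, the step count, and the example filter values are assumptions made for illustration; this is not taken from the DiffusionInst code.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def noise_filter(x0, t, num_steps, rng):
    """Sample x_t from N(sqrt(a) * x0, (1 - a) * I), where a = alpha_bar(t, num_steps)."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
gt_filter = [0.5, -1.0, 0.25, 2.0]  # hypothetical ground-truth filter weights
x_t, eps = noise_filter(gt_filter, t=500, num_steps=1000, rng=rng)
```

A denoising network would then be trained to recover the clean filter (or the noise `eps`) from `x_t` and `t`; at inference the process would start from pure Gaussian noise instead of a noised ground truth.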
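Analog and mixed-signal (AMS) circuit design still relies on human design expertise. Machine learning has been assisting circuit design automation by replacing human experience with artificial intelligence. This paper presents TAG, a new paradigm for learning circuit representations from layouts that leverages text, self-attention, and graphs. The embedding network model learns spatial information without manual labeling. We introduce text embedding and a self-attention mechanism to AMS circuit learning. Experimental results demonstrate the ability to predict layout distances between instances on industrial FinFET technology benchmarks. The effectiveness of the circuit representation is verified by showing, in case studies, its transferability to three other learning tasks with limited data: layout matching prediction, wirelength estimation, and net parasitic capacitance prediction.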
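Few-shot recognition aims to recognize novel categories under low-data regimes. Because images are scarce, machines cannot obtain enough effective information, and the generalization ability of the model is extremely weak. By using auxiliary semantic modalities, recent metric-learning-based few-shot learning methods have achieved promising performance. However, these methods only augment the representations of the support classes, while query images have no semantic modality information with which to augment their representations. Instead, we propose attribute-shaped learning (ASL), which can normalize visual representations to predict attributes for query images. We further devise an attribute-visual attention module (AVAM), which uses attributes to generate more discriminative features. Our method enables visual representations to focus on important regions under attribute guidance. Experiments demonstrate that our method achieves competitive results on the CUB and SUN benchmarks. Our code is available at https://github.com/chenhaoxing/asl.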
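Learning from limited data is a challenging task because the scarcity of data leads to poor generalization of the trained model. The classical globally pooled representation is likely to lose useful local information. Recently, many few-shot learning methods have addressed this challenge by using deep descriptors and learning a pixel-level metric. However, using deep descriptors as feature representations may lose the contextual information of the image. Moreover, most of these methods handle each class in the support set independently, which cannot fully exploit discriminative information and task-specific embeddings. In this paper, we propose a novel Transformer-based neural network architecture called Sparse Spatial Transformers (SSFormers), which can find task-relevant features and suppress task-irrelevant features. Specifically, we first divide each input image into several image patches of different sizes to obtain dense local features. These features retain contextual information while expressing local information. Then, a sparse spatial transformer layer is proposed to find spatial correspondences between the query image and the entire support set, in order to select task-relevant image patches and suppress task-irrelevant ones. Finally, we propose an image patch matching module that computes the distance between dense local representations, thereby determining which category in the support set the query image belongs to. Extensive experiments on popular few-shot learning benchmarks show that our method achieves state-of-the-art performance.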
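Few-shot learning is devoted to training a model on only a few samples. Most existing approaches learn a model based on either a pixel-level or a global-level feature representation. However, using global features may lose local information, and using pixel-level features may lose the contextual semantics of the image. Moreover, such works can only measure the relations between images at a single level, which is neither comprehensive nor effective. If query images can be classified well by three distinct levels of similarity metrics simultaneously, the query images within a class can be distributed more tightly in a smaller feature space, yielding more discriminative feature maps. Motivated by this, we propose a novel Part-level Embedding Adaptation with Graph (PEAG) method to generate task-specific features. In addition, a Multi-level Metric Learning (MML) method is proposed, which not only computes pixel-level similarity but also considers the similarity of part-level and global-level features. Extensive experiments on popular few-shot image recognition datasets demonstrate the effectiveness of our method compared with state-of-the-art methods. Our code is available at https://github.com/chenhaoxing/m2l.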
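Machine learning techniques have been studied extensively for mask optimization problems, aiming at better mask printability, shorter turnaround time, better mask manufacturability, and so on. However, most of these studies focus on generating initial solutions for small design regions. To further realize the potential of machine learning techniques in mask optimization tasks, we present a Convolutional Fourier Neural Operator (CFNO) that can efficiently learn layout tile dependencies and hence promises stitch-less large-scale mask optimization with limited intervention from legacy tools. We also discover the possibility of litho-guided self-training (LGST) with a trained machine learning model when solving non-convex optimization problems, which allows iterative model and dataset updates and brings significant improvements in model performance. Experimental results show that, for the first time, our machine-learning-based framework outperforms state-of-the-art academic numerical mask optimizers, with an order-of-magnitude speedup.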
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
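To make the "greatest information gain" criterion concrete, the following minimal sketch ranks discrete features by information gain with respect to a label and keeps the top k. The feature names and toy data are hypothetical, and continuous properties would first need to be discretized; this is not the MGTAB extraction pipeline itself.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the entropy of labels conditioned on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(features, labels, k):
    """Return the names of the k features with the largest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Toy example with hypothetical binary user properties; 1 = bot, 0 = human.
labels = [1, 1, 0, 0]
features = {
    "default_profile_image": [1, 1, 0, 0],  # perfectly separates the classes
    "has_url": [1, 0, 1, 0],                # carries no information here
}
selected = top_k_features(features, labels, k=1)
```

In this toy data the first feature splits the labels perfectly and the second not at all, so only the first is selected.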
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes the important frames in the demonstrations.
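The masking-and-retraining loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: `train_and_evaluate` is a hypothetical stand-in for retraining an IL model on the kept frames and returning its environment return, and the per-frame normalization (average return over the masks that keep a frame) is one plausible choice rather than the exact R2RISE formula.

```python
import random

def importance_map(num_frames, train_and_evaluate, num_masks=500, keep_prob=0.5, seed=0):
    """Score each frame by the average return of the runs whose mask kept that frame."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames
    counts = [0] * num_frames
    for _ in range(num_masks):
        # Random binary mask over demonstration frames: 1 keeps the frame, 0 drops it.
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = train_and_evaluate(mask)  # hypothetical: retrain on kept frames, then evaluate
        for i, kept in enumerate(mask):
            if kept:
                weighted[i] += ret
                counts[i] += 1
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]

def toy_evaluate(mask):
    """Toy stand-in for training: only frames 2 and 5 contribute to the return."""
    return float(mask[2] + mask[5])

scores = importance_map(8, toy_evaluate)
top2 = sorted(range(8), key=lambda i: scores[i], reverse=True)[:2]
```

With the toy evaluator, frames 2 and 5 receive the highest scores, because the runs that keep them obtain a higher return on average.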
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical for improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e., blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e., flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on the six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.